18 research outputs found

    SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems

    Full text link
    Pick-and-place is one of the fundamental tasks in robotics research. However, the attention has been mostly focused on the ``pick'' task, leaving the ``place'' task relatively unexplored. In this paper, we address the problem of placing objects in the context of a teleoperation framework. Particularly, we focus on two aspects of the place task: stability robustness and contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim and the semantic reasoning capability of large language models. In other words, given place context information (e.g., user preferences, object to place, and current scene information), our proposed method outputs a probability distribution over the possible placement candidates, considering the robustness and reasonableness of the place task. Our proposed method is extensively evaluated in two simulation and one real world environments and we show that our method can greatly increase the physical plausibility of the placement as well as contextual soundness while considering user preferences.Comment: 7 page

    INSTA-BEEER: Explicit Error Estimation and Refinement for Fast and Accurate Unseen Object Instance Segmentation

    Full text link
    Efficient and accurate segmentation of unseen objects is crucial for robotic manipulation. However, it remains challenging due to over- or under-segmentation. Although existing refinement methods can enhance the segmentation quality, they fix only minor boundary errors or are not sufficiently fast. In this work, we propose INSTAnce Boundary Explicit Error Estimation and Refinement (INSTA-BEEER), a novel refinement model that allows for adding and deleting instances and sharpening boundaries. Leveraging an error-estimation-then-refinement scheme, the model first estimates the pixel-wise boundary explicit errors: true positive, true negative, false positive, and false negative pixels of the instance boundary in the initial segmentation. It then refines the initial segmentation using these error estimates as guidance. Experiments show that the proposed model significantly enhances segmentation, achieving state-of-the-art performance. Furthermore, with a fast runtime (less than 0.1 s), the model consistently improves performance across various initial segmentation methods, making it highly suitable for practical robotic applications.Comment: 8 pages, 5 figure

    CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

    Full text link
    In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish it between ambiguous or infeasible commands leveraging LLMs with situational aware context in a zero-shot manner. For ambiguous commands, we disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunction and undesired actions of the robot, enhancing the reliability of interactive robot agents. We present a dataset for robotic situational awareness, consisting pair of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset, pick-and-place tabletop simulation. Finally, we demonstrate the proposed approach in real-world human-robot interaction experiments, i.e., handover scenarios

    Nondestructive examination of chemical vapor infiltration of 0°/90° SiC/Nicalon composites

    No full text
    Ph.D.S. R. Stoc

    Similarity Graph-Based Camera Tracking for Effective 3D Geometry Reconstruction with Mobile RGB-D Camera

    No full text
    In this paper, we present a novel approach for reconstructing 3D geometry from a stream of images captured by a consumer-grade mobile RGB-D sensor. In contrast to previous real-time online approaches that process each incoming image in acquisition order, we show that applying a carefully selected order of (possibly a subset of) frames for pose estimation enables the performance of robust 3D reconstruction while automatically filtering out error-prone images. Our algorithm first organizes the input frames into a weighted graph called the similarity graph. A maximum spanning tree is then found in the graph, and its traversal determines the frames and their processing order. The basic algorithm is then extended by locally repairing the original spanning tree and merging disconnected tree components, if they exist, as much as possible, enhancing the result of 3D reconstruction. The capability of our method to generate a less error-prone stream from an input RGB-D stream may also be effectively combined with more sophisticated state-of-the-art techniques, which further increases their effectiveness in 3D reconstruction

    Semi-Autonomous Teleoperation via Learning Non-Prehensile Manipulation Skills

    Full text link
    In this paper, we present a semi-autonomous teleoperation framework for a pick-and-place task using an RGB-D sensor. In particular, we assume that the target object is located in a cluttered environment where both prehensile grasping and non-prehensile manipulation are combined for efficient teleoperation. A trajectory-based reinforcement learning is utilized for learning the non-prehensile manipulation to rearrange the objects for enabling direct grasping. From the depth image of the cluttered environment and the location of the goal object, the learned policy can provide multiple options of non-prehensile manipulation to the human operator. We carefully design a reward function for the rearranging task where the policy is trained in a simulational environment. Then, the trained policy is transferred to a real-world and evaluated in a number of real-world experiments with the varying number of objects where we show that the proposed method outperforms manual keyboard control in terms of the time duration for the grasping
    corecore